Abstract

We study infinite stochastic games played by two players over a finite
state space, with objectives specified by sets of infinite traces. The
games are concurrent (players make moves simultaneously and
independently), stochastic (the next state is determined by a probability
distribution that depends on the current state and the chosen moves of
the players), and infinite (they proceed for an infinite number of rounds).
The analysis of concurrent stochastic games can be classified into
quantitative analysis, which studies the optimum value of the game and
$\varepsilon$-optimal strategies that ensure values within $\varepsilon$ of the optimum
value, and qualitative analysis, which studies the set of states with
optimum value 1 and $\varepsilon$-optimal strategies for the states with optimum
value 1. We consider concurrent games with tail objectives, i.e.,
objectives that are independent of every finite prefix of a trace, and
show that the class of tail objectives is strictly richer than the class
of $\omega$-regular objectives. We develop new proof techniques to extend
several properties of concurrent games with $\omega$-regular objectives to
concurrent games with tail objectives. We prove the positive limit-one
property for tail objectives, which states that, for all concurrent games,
if the optimum value for a player is positive for a tail objective $\Phi$
at some state, then there is a state where the optimum value is 1 for
objective $\Phi$ for that player. We show that strategies for quantitative
winning can be constructed from witnesses of strategies for qualitative
winning. The results establish a relationship between the quantitative
and qualitative analysis of concurrent games with tail objectives. We
also show that the optimum values of zero-sum games (strictly conflicting
objectives) with tail objectives can be related to equilibrium values of
nonzero-sum games (not strictly conflicting objectives) with simpler
reachability objectives. As a consequence of our analysis, we obtain a
polytime reduction of the quantitative analysis of tail objectives to the
qualitative analysis for the sub-class of one-player stochastic games
(Markov decision processes).

*This research was supported in part by the ONR grant N00014-02-1-0671, the AFOSR
MURI grant F49620-00-1-0327, and the NSF ITR grant CCR-0225610.


1  Introduction 

Stochastic games. Non-cooperative games provide a natural framework to
model interactions between agents [15, 16]. A wide class of games progress
over time and in a stateful manner, and the current game depends on the
history of interactions. Infinite stochastic games [18, 9] are a natural
model for such dynamic games. A stochastic game is played over a finite
state space and is played in rounds. In concurrent games, in each round,
each player chooses an action from a finite set of available actions,
simultaneously and independently of the other player. The game proceeds to
a new state according to a probabilistic transition relation (stochastic
transition matrix) based on the current state and the joint actions of the
players. Concurrent games subsume the simpler class of turn-based games,
where at every state at most one player can choose between multiple
actions, and Markov decision processes (MDPs), where only one player can
choose between multiple actions at every state. In verification and
control of finite-state reactive systems such games proceed for infinitely
many rounds, generating an infinite sequence of states, called the outcome
of the game. The players receive a payoff based on a payoff function that
maps every outcome to a real number.

Objectives. Payoffs are generally Borel measurable functions [13]. For
example, the payoff set for each player $i$ is a Borel set $B_i$ in the
Cantor topology on $S^\omega$ (where $S$ is the set of states), and player $i$
gets payoff 1 if the outcome of the game is a member of $B_i$, and 0
otherwise. In verification, payoff functions are usually index sets of
$\omega$-regular languages. The $\omega$-regular languages generalize the classical
regular languages to infinite strings; they occur in low levels of the
Borel hierarchy (they are in $\Sigma_3 \cap \Pi_3$), and they form a robust and
expressive language for determining payoffs for commonly used
specifications [12]. The simplest $\omega$-regular objectives correspond to
safety (“closed sets”) and reachability (“open sets”) objectives.

Zero-sum games, determinacy and nonzero-sum games. Games may
be zero-sum, where two players have directly conflicting objectives and the
payoff of one player is one minus the payoff of the other, or nonzero-sum,
where each player has a prescribed payoff function based on the outcome
of the game. The fundamental question for games is the existence of
equilibrium values. For zero-sum games, this involves showing a determinacy
theorem that states that the expected optimum value obtained by player 1
is exactly one minus the expected optimum value obtained by player 2. For
one-step zero-sum games, this is von Neumann’s minimax theorem [21]. For
infinite games, the existence of such equilibria is not obvious; in fact,
using the axiom of choice, one can construct games for which determinacy
does not hold. However, a remarkable result by Martin [13] shows that all
stochastic zero-sum games with Borel payoffs are determined. For
nonzero-sum games, the fundamental equilibrium concept is a Nash
equilibrium [10], that is, a strategy profile such that no player can gain
by deviating from the profile, assuming the other player continues playing
the strategy in the profile.

Qualitative and quantitative analysis. The analysis of concurrent
zero-sum games can be broadly classified into

•  quantitative analysis, which involves the analysis of the optimum
values of the games and of $\varepsilon$-optimal strategies that ensure values
within $\varepsilon$ of the optimum value; and

•  qualitative analysis, which involves the simpler analysis of the set of
states where the optimum value is 1 and of $\varepsilon$-limit-sure winning
strategies that ensure satisfying the objective with value at least
$1 - \varepsilon$.

In general, the qualitative analysis of concurrent games is simpler than
the quantitative analysis, as it only considers the case when the value
is 1. Optimum values in concurrent games can be irrational even for
reachability and safety objectives (with all transition probabilities
rational), and hence the quantitative analysis is more involved.

Properties of concurrent games. The result of Martin [13] establishes
the determinacy of zero-sum concurrent games for all Borel objectives. The
determinacy result raises the problem of studying and better understanding
the properties and behaviors of concurrent games with different classes of
objectives. Several interesting questions related to concurrent games are:
(1) the relationship between qualitative and quantitative analysis; (2) the
characterization of $\varepsilon$-optimal strategies and $\varepsilon$-limit-sure winning
strategies and their relationship; (3) the relationship between zero-sum
and nonzero-sum games. The results of [6, 7, 1] exhibited several
interesting properties for concurrent games with $\omega$-regular objectives
specified as parity objectives. The result of [6] showed the positive
limit-one property for concurrent games with parity objectives: if there is
a state with positive optimum value, then there is a state with optimum
value 1. The positive limit-one property and the relation between
qualitative and quantitative analysis were key to developing algorithms
and improved complexity bounds for the quantitative analysis of concurrent
games with parity objectives [1]. These properties can often be the basic
ingredients for the computational complexity analysis of the quantitative
analysis of concurrent games.

Outline of results. In this work, we consider tail objectives, the
objectives that do not depend on any finite prefix of the traces. Tail
objectives subsume canonical $\omega$-regular objectives such as parity
objectives and Muller objectives, and we show that there exist tail
objectives that cannot be expressed as $\omega$-regular objectives. Hence tail
objectives are a strictly richer class of objectives than $\omega$-regular
objectives. Our results characterize several properties of concurrent
games with tail objectives. The results are as follows:

1.  We show the positive limit-one property for concurrent games with
tail objectives. Our result thus extends the result of [6] from parity
objectives to a richer class of objectives that lie in the higher
levels of the Borel hierarchy. The result of [6] follows from a
complementation argument for quantitative $\mu$-calculus formulas. Our proof
technique is completely different: it uses a novel strategy construction
procedure and a convergence result from martingale theory. It may be
noted that the positive limit-one property for concurrent games with
Muller objectives follows from the positive limit-one property for parity
objectives and the reduction of Muller objectives to parity objectives
[20]. Since Muller objectives are tail objectives, our result presents a
direct proof of the positive limit-one property for concurrent games with
Muller objectives.

2.  We establish a connection between the complexity of strategies for
quantitative winning ($\varepsilon$-optimality) and qualitative winning
($\varepsilon$-limit-sure winning) for tail objectives. We show that witnesses for
strategies for quantitative winning can be constructed by composing
witnesses of strategies that are qualitative winning in sub-games and
that respect certain local conditions.

3.  We relate the optimum values of zero-sum games with tail objectives
to Nash equilibrium values of nonzero-sum games with reachability
objectives. This establishes a relationship between the values of
concurrent games with complicated tail objectives and Nash equilibria
of nonzero-sum games with simpler objectives. Our result also
presents a polytime reduction of the quantitative analysis of tail
objectives to the qualitative analysis for the special case of MDPs. This
result was previously known for the sub-class of $\omega$-regular objectives
[4, 5, 2]. The proof techniques of [4, 5, 2] use a different analysis of
the structure of MDPs and are completely different from our proof
techniques.

The properties we prove make it likely that qualitative analysis for
concurrent games with tail objectives can be extended to quantitative
analysis. The complexity of the qualitative analysis of concurrent games
and their sub-classes with tail objectives is an open problem.

2  Definitions 

Notation. For a countable set $A$, a probability distribution on $A$ is a
function $\delta : A \to [0,1]$ such that $\sum_{a \in A} \delta(a) = 1$. We denote the
set of probability distributions on $A$ by $\mathcal{D}(A)$. Given a distribution
$\delta \in \mathcal{D}(A)$, we denote by $\mathrm{Supp}(\delta) = \{ x \in A \mid \delta(x) > 0 \}$ the
support of $\delta$.

Definition 1 (Concurrent Games) A (two-player) concurrent game
structure $G = \langle S, \mathit{Moves}, \mathit{Mv}_1, \mathit{Mv}_2, \delta \rangle$ consists of the following
components:

•  A finite state space $S$.

•  A finite set $\mathit{Moves}$ of moves.

•  Two move assignments $\mathit{Mv}_1, \mathit{Mv}_2 : S \to 2^{\mathit{Moves}} \setminus \emptyset$. For $i \in \{1,2\}$,
assignment $\mathit{Mv}_i$ associates with each state $s \in S$ the non-empty set
$\mathit{Mv}_i(s) \subseteq \mathit{Moves}$ of moves available to player $i$ at state $s$.

•  A probabilistic transition function $\delta : S \times \mathit{Moves} \times \mathit{Moves} \to \mathcal{D}(S)$
that gives the probability $\delta(s, a_1, a_2)(t)$ of a transition from $s$ to $t$
when player 1 plays move $a_1$ and player 2 plays move $a_2$, for all
$s, t \in S$ and $a_1 \in \mathit{Mv}_1(s)$, $a_2 \in \mathit{Mv}_2(s)$. ∎

An important special class of concurrent games is the class of Markov
decision processes (MDPs). In an MDP, at every state $s$ we have
$|\mathit{Mv}_2(s)| = 1$, i.e., the set of available moves for player 2 is a
singleton at every state.

At every state $s \in S$, player 1 chooses a move $a_1 \in \mathit{Mv}_1(s)$, and
simultaneously and independently player 2 chooses a move $a_2 \in \mathit{Mv}_2(s)$.
The game then proceeds to the successor state $t$ with probability
$\delta(s, a_1, a_2)(t)$, for all $t \in S$. A state $s$ is called an absorbing state
if for all $a_1 \in \mathit{Mv}_1(s)$ and $a_2 \in \mathit{Mv}_2(s)$ we have $\delta(s, a_1, a_2)(s) = 1$. In
other words, at $s$, for all choices of moves of the players, the next state
is always $s$. We assume that the players act non-cooperatively, i.e., each
player chooses her strategy independently and secretly from the other
player, and is only interested in maximizing her own reward. For all
states $s \in S$ and moves $a_1 \in \mathit{Mv}_1(s)$ and $a_2 \in \mathit{Mv}_2(s)$, we denote by
$\mathrm{Dest}(s, a_1, a_2) = \mathrm{Supp}(\delta(s, a_1, a_2))$ the set of possible successors of
$s$ when moves $a_1, a_2$ are selected.
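The structure of Definition 1 can be written down directly as a small data type. The following is a minimal sketch, not part of the original development: the class name `ConcurrentGame`, its method names, and the three-state toy game are all hypothetical and chosen only for illustration. Probabilities are exact fractions so that the distribution check is not affected by rounding.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ConcurrentGame:
    """Hypothetical container for a concurrent game structure."""
    states: set   # finite state space S
    mv1: dict     # state -> non-empty set of moves of player 1
    mv2: dict     # state -> non-empty set of moves of player 2
    delta: dict   # (s, a1, a2) -> {successor: probability}

    def validate(self):
        """Check non-empty move sets and that delta(s, a1, a2) is a distribution on S."""
        for s in self.states:
            assert self.mv1[s] and self.mv2[s]
            for a1 in self.mv1[s]:
                for a2 in self.mv2[s]:
                    dist = self.delta[(s, a1, a2)]
                    assert set(dist) <= self.states
                    assert all(p >= 0 for p in dist.values())
                    assert sum(dist.values()) == 1

    def dest(self, s, a1, a2):
        """Dest(s, a1, a2): the support of delta(s, a1, a2)."""
        return {t for t, p in self.delta[(s, a1, a2)].items() if p > 0}

    def is_absorbing(self, s):
        """True if every pair of available moves leads back to s with probability 1."""
        return all(self.delta[(s, a1, a2)].get(s, 0) == 1
                   for a1 in self.mv1[s] for a2 in self.mv2[s])

    def is_mdp(self):
        """True if player 2 has exactly one available move at every state."""
        return all(len(self.mv2[s]) == 1 for s in self.states)


# Toy game (an assumption for illustration): at "s", matching moves lead to
# "win"; mismatched moves stay in "s" or move to "lose" with probability 1/2 each.
half = Fraction(1, 2)
toy = ConcurrentGame(
    states={"s", "win", "lose"},
    mv1={"s": {"a", "b"}, "win": {"a"}, "lose": {"a"}},
    mv2={"s": {"a", "b"}, "win": {"a"}, "lose": {"a"}},
    delta={
        ("s", "a", "a"): {"win": Fraction(1)},
        ("s", "b", "b"): {"win": Fraction(1)},
        ("s", "a", "b"): {"s": half, "lose": half},
        ("s", "b", "a"): {"s": half, "lose": half},
        ("win", "a", "a"): {"win": Fraction(1)},
        ("lose", "a", "a"): {"lose": Fraction(1)},
    },
)
toy.validate()
```

In this toy game the states "win" and "lose" are absorbing, and the game is not an MDP because player 2 has two moves at "s".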

A path or a play $\omega$ of $G$ is an infinite sequence $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ of
states in $S$ such that for all $k \ge 0$, there are moves $a_1^k \in \mathit{Mv}_1(s_k)$ and
$a_2^k \in \mathit{Mv}_2(s_k)$ with $\delta(s_k, a_1^k, a_2^k)(s_{k+1}) > 0$. We denote by $\Omega$ the set of
all paths and by $\Omega_s$ the set of all paths $\omega = \langle s_0, s_1, s_2, \ldots \rangle$ such that
$s_0 = s$, i.e., the set of plays starting from state $s$.

Randomized strategies. A selector $\xi$ for player $i \in \{1,2\}$ is a function
$\xi : S \to \mathcal{D}(\mathit{Moves})$ such that for all $s \in S$ and $a \in \mathit{Moves}$, if $\xi(s)(a) > 0$,
then $a \in \mathit{Mv}_i(s)$. We denote by $\Lambda_i$ the set of all selectors for player
$i \in \{1,2\}$. A selector $\xi$ is pure if for every $s \in S$ there is $a \in \mathit{Moves}$
such that $\xi(s)(a) = 1$; we denote by $\Lambda_i^P \subseteq \Lambda_i$ the set of pure selectors
for player $i$. A strategy for player 1 is a function $\tau : S^+ \to \Lambda_1$ that
associates with every finite non-empty sequence of states, representing
the history of the play so far, a selector. Similarly we define strategies
$\pi$ for player 2. A strategy $\tau$ for player $i$ is pure if it yields only pure
selectors, that is, it is of type $S^+ \to \Lambda_i^P$. A memoryless strategy is
independent of the history of the play and depends only on the current
state. Memoryless strategies coincide with selectors, and we often write
$\tau$ for the selector corresponding to a memoryless strategy $\tau$. A strategy
is pure memoryless if it is pure and memoryless. We denote by $\Gamma^P$, $\Gamma^M$,
and $\Gamma^{PM}$ the families of pure, memoryless, and pure memoryless strategies
for player 1, respectively. Analogously we define the families of
strategies for player 2. We denote by $\Gamma$ and $\Pi$ the sets of all strategies
for player 1 and player 2, respectively.

Once the starting state $s$ and the strategies $\tau$ and $\pi$ for the two players
have been chosen, the game is reduced to an ordinary stochastic process.
Hence, the probabilities of events are uniquely defined, where an event
$\mathcal{A} \subseteq \Omega_s$ is a measurable set of paths. For an event $\mathcal{A} \subseteq \Omega_s$, we denote by
$\Pr_s^{\tau,\pi}(\mathcal{A})$ the probability that a path belongs to $\mathcal{A}$ when the game starts
from $s$ and the players follow the strategies $\tau$ and $\pi$. For $i \ge 0$, we also
denote by $\Theta_i : \Omega_s \to S$ the random variable denoting the $i$-th state along a
path.
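For the special case in which both players use memoryless strategies, the stochastic process mentioned above is a finite Markov chain, and the distribution of the $i$-th state can be computed exactly. The sketch below illustrates this under that assumption; the function names `induced_chain` and `state_distribution` and the toy game are hypothetical. General (history-dependent) strategies are not covered by this sketch.

```python
from fractions import Fraction


def induced_chain(states, delta, sel1, sel2):
    """Markov chain obtained when both players play the given selectors."""
    chain = {}
    for s in states:
        row = {t: Fraction(0) for t in states}
        for a1, p1 in sel1[s].items():
            for a2, p2 in sel2[s].items():
                for t, p in delta[(s, a1, a2)].items():
                    row[t] += p1 * p2 * p
        chain[s] = row
    return chain


def state_distribution(chain, start, i):
    """Distribution of the i-th state of the play when the 0-th state is `start`."""
    dist = {s: Fraction(int(s == start)) for s in chain}
    for _ in range(i):
        nxt = {s: Fraction(0) for s in chain}
        for s, mass in dist.items():
            for t, p in chain[s].items():
                nxt[t] += mass * p
        dist = nxt
    return dist


# Toy game (an assumption): matching moves at "s" lead to "win"; mismatched
# moves stay in "s" or move to "lose" with probability 1/2 each.
half = Fraction(1, 2)
states = ["s", "win", "lose"]
delta = {
    ("s", "a", "a"): {"win": Fraction(1)},
    ("s", "b", "b"): {"win": Fraction(1)},
    ("s", "a", "b"): {"s": half, "lose": half},
    ("s", "b", "a"): {"s": half, "lose": half},
    ("win", "a", "a"): {"win": Fraction(1)},
    ("lose", "a", "a"): {"lose": Fraction(1)},
}
uniform = {"s": {"a": half, "b": half}, "win": {"a": Fraction(1)}, "lose": {"a": Fraction(1)}}
chain = induced_chain(states, delta, uniform, uniform)
after_two = state_distribution(chain, "s", 2)
```

With both players randomizing uniformly at "s", one round leads to "win" with probability 1/2, and `after_two` gives the exact distribution of the state after two rounds.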




Objectives. We specify objectives for the players by providing the set of
winning plays $\Phi \subseteq \Omega$ for each player. Given an objective $\Phi$ we denote by
$\overline{\Phi} = \Omega \setminus \Phi$ the complementary objective of $\Phi$. A concurrent game with
objective $\Phi_1$ for player 1 and $\Phi_2$ for player 2 is zero-sum if
$\Phi_2 = \overline{\Phi}_1$ [17, 9]. A general class of objectives is the class of Borel
objectives [11]. A Borel objective $\Phi \subseteq S^\omega$ is a Borel set in the Cantor
topology on $S^\omega$. In this paper we consider $\omega$-regular objectives [20],
which lie in the first $2\frac{1}{2}$ levels of the Borel hierarchy (i.e., in the
intersection of $\Sigma_3$ and $\Pi_3$), and tail objectives, which form a strictly
richer class than the $\omega$-regular objectives. The $\omega$-regular objectives,
subclasses thereof, and tail objectives are defined below. For a play
$\omega = \langle s_0, s_1, s_2, \ldots \rangle \in \Omega$, we define
$\mathrm{Inf}(\omega) = \{ s \in S \mid s_k = s \text{ for infinitely many } k \ge 0 \}$ to be the set of
states that occur infinitely often in $\omega$.

•  Reachability and safety objectives. Given a set $T \subseteq S$ of “target”
states, the reachability objective requires that some state of $T$
be visited. The set of winning plays is thus
$\mathrm{Reach}(T) = \{ \omega = \langle s_0, s_1, s_2, \ldots \rangle \in \Omega \mid s_k \in T \text{ for some } k \ge 0 \}$.
Given a set $F \subseteq S$, the safety objective requires that only states of
$F$ be visited. Thus, the set of winning plays is
$\mathrm{Safe}(F) = \{ \omega = \langle s_0, s_1, s_2, \ldots \rangle \in \Omega \mid s_k \in F \text{ for all } k \ge 0 \}$.

•  Büchi and coBüchi objectives. Given a set $B \subseteq S$ of “Büchi” states, the
Büchi objective requires that $B$ is visited infinitely often. Formally, the
set of winning plays is $\mathrm{Büchi}(B) = \{ \omega \in \Omega \mid \mathrm{Inf}(\omega) \cap B \neq \emptyset \}$. Given
$C \subseteq S$, the coBüchi objective requires that all states visited infinitely
often are in $C$. Formally, the set of winning plays is
$\mathrm{coBüchi}(C) = \{ \omega \in \Omega \mid \mathrm{Inf}(\omega) \subseteq C \}$.

•  Parity objective. For $c, d \in \mathbb{N}$, we let $[c..d] = \{ c, c+1, \ldots, d \}$. Let
$p : S \to [0..d]$ be a function that assigns a priority $p(s)$ to every state
$s \in S$, where $d \in \mathbb{N}$. The Even parity objective is defined as
$\mathrm{Parity}(p) = \{ \omega \in \Omega \mid \min(p(\mathrm{Inf}(\omega))) \text{ is even} \}$, and the Odd parity
objective as $\mathrm{coParity}(p) = \{ \omega \in \Omega \mid \min(p(\mathrm{Inf}(\omega))) \text{ is odd} \}$. Informally,
we say that a path $\omega$ satisfies the parity objective $\mathrm{Parity}(p)$ if
$\omega \in \mathrm{Parity}(p)$.

•  Muller objective. Given a set $\mathcal{M} \subseteq 2^S$, the Muller objective is defined
as $\mathrm{Muller}(\mathcal{M}) = \{ \omega \in \Omega \mid \mathrm{Inf}(\omega) \in \mathcal{M} \}$.

•  Tail objective. Informally, the class of tail objectives is the class of
objectives that are independent of all finite prefixes. An objective $\Phi$
is a tail objective if the following condition holds: a path $\omega \in \Phi$ if
and only if for all $i \ge 0$, $\omega_i \in \Phi$, where $\omega_i$ denotes the path $\omega$ with
the prefix of length $i$ deleted. Formally, let $\mathcal{G}_i = \sigma(\Theta_i, \Theta_{i+1}, \ldots)$ be
the $\sigma$-field generated by the random variables $\Theta_i, \Theta_{i+1}, \ldots$. The tail
$\sigma$-field $\mathcal{T}$ is defined as $\mathcal{T} = \bigcap_{i \ge 0} \mathcal{G}_i$. An objective $\Phi$ is a tail
objective if and only if $\Phi$ belongs to the tail $\sigma$-field $\mathcal{T}$, i.e., the
tail objectives are indicator functions of events $\mathcal{A} \in \mathcal{T}$.
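The objectives listed above can be evaluated mechanically on ultimately periodic plays, i.e., plays of the form prefix followed by a cycle repeated forever. The sketch below assumes this finite representation (a restriction introduced only to make the example computable); all function names are hypothetical. It also illustrates the prefix-independence that characterizes tail objectives: deleting a finite prefix never changes the verdict for Büchi, parity, or Muller objectives, whereas it can change the verdict for reachability.

```python
def inf_states(prefix, cycle):
    """Inf(w) for the play prefix . cycle^omega: exactly the states on the cycle."""
    return frozenset(cycle)


def reach(prefix, cycle, target):
    return any(s in target for s in prefix + cycle)


def safe(prefix, cycle, safe_set):
    return all(s in safe_set for s in prefix + cycle)


def buchi(prefix, cycle, b):
    return bool(inf_states(prefix, cycle) & set(b))


def cobuchi(prefix, cycle, c):
    return inf_states(prefix, cycle) <= set(c)


def parity(prefix, cycle, priority):
    """Even parity objective: the minimum priority seen infinitely often is even."""
    return min(priority[s] for s in inf_states(prefix, cycle)) % 2 == 0


def muller(prefix, cycle, family):
    return inf_states(prefix, cycle) in family


def drop_prefix(prefix, cycle, i):
    """The same play with its first i states deleted."""
    if i <= len(prefix):
        return prefix[i:], cycle
    j = (i - len(prefix)) % len(cycle)
    return (), cycle[j:] + cycle[:j]


# Example play a b (c d)^omega with hypothetical state names and priorities.
prefix, cycle = ("a", "b"), ("c", "d")
priority = {"a": 0, "b": 1, "c": 1, "d": 2}
suffixes = [drop_prefix(prefix, cycle, i) for i in range(6)]
```

On this play, `reach(prefix, cycle, {"a"})` holds but fails after the first state is dropped, while the Büchi, parity, and Muller verdicts are identical on every element of `suffixes`.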

The Muller and parity objectives are canonical forms to represent
$\omega$-regular objectives [14, 19]. Observe that Muller and parity objectives
are tail objectives. Note that for a priority function $p : S \to \{ 0, 1 \}$, an
even parity objective $\mathrm{Parity}(p)$ is equivalent to the Büchi objective
$\mathrm{Büchi}(p^{-1}(0))$, i.e., the Büchi set consists of the states with priority 0.
Büchi and coBüchi objectives are special cases of parity objectives and
hence tail objectives. Reachability objectives are not necessarily tail
objectives, but for a set $T \subseteq S$ of states, if every state $s \in T$ is an
absorbing state, then the objective $\mathrm{Reach}(T)$ is equivalent to $\mathrm{Büchi}(T)$
and hence is a tail objective. It may be noted that since $\sigma$-fields are
closed under complementation, the class of tail objectives is closed under
complementation. We give an example to show that the class of tail
objectives is richer than the class of $\omega$-regular objectives.

Example 1 Let $r$ be a reward function that maps every state $s$ to a
real-valued reward $r(s)$, i.e., $r : S \to \mathbb{R}$. For a constant $c \in \mathbb{N}$ consider the
objective $\Phi_c = \mathrm{limsup}_c$ defined as follows:

\[
\Phi_c = \Bigl\{ \omega \in \Omega \Bigm| \omega = \langle s_1, s_2, s_3, \ldots \rangle,\ \limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} r(s_i) \ge c \Bigr\}.
\]

Intuitively, $\Phi_c$ accepts the set of paths such that the “long-run” average of
the rewards in the path is at least the constant $c$. The lim sup condition
lies in the third level of the Borel hierarchy (i.e., it is in $\Pi_3$ and is
$\Pi_3$-complete) and cannot be expressed as an $\omega$-regular objective. Hence the
class $\bigcup_{c \in \mathbb{N}} \mathrm{limsup}_c$ of objectives cannot be expressed as $\omega$-regular
objectives. It may be noted that the “long-run” average of a path is
independent of all finite prefixes of the path. Formally, the objectives in
the class $\bigcup_{c \in \mathbb{N}} \mathrm{limsup}_c$ are tail objectives. Since the $\mathrm{limsup}_c$
objectives are $\Pi_3$-complete, it follows that tail objectives lie in higher
levels of the Borel hierarchy than $\omega$-regular objectives. ∎
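The prefix-independence of the long-run average in Example 1 can be checked concretely on ultimately periodic paths, where the lim sup is an ordinary limit equal to the mean reward on the cycle. The following sketch makes that restriction explicitly (general paths need not be ultimately periodic); the function names and the reward values are hypothetical.

```python
from fractions import Fraction


def long_run_average(prefix, cycle, reward):
    """Limit of the running average on prefix . cycle^omega: the mean over the cycle."""
    return Fraction(sum(reward[s] for s in cycle), len(cycle))


def finite_average(prefix, cycle, reward, n):
    """Average reward over the first n states of the path."""
    total = 0
    for k in range(n):
        if k < len(prefix):
            s = prefix[k]
        else:
            s = cycle[(k - len(prefix)) % len(cycle)]
        total += reward[s]
    return Fraction(total, n)


def satisfies_limsup(prefix, cycle, reward, c):
    """Membership in the objective of Example 1 for an ultimately periodic path."""
    return long_run_average(prefix, cycle, reward) >= c


# A large reward in the prefix does not affect the long-run average.
reward = {"a": 100, "b": 0, "c": 1, "d": 3}
with_prefix = long_run_average(("a", "b"), ("c", "d"), reward)
without_prefix = long_run_average((), ("c", "d"), reward)
```

Both `with_prefix` and `without_prefix` equal the cycle mean, and `finite_average` approaches that value as `n` grows, even though the prefix contains a reward of 100.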

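Values. The probability that a path satisfies an objective $\Phi$ starting from
state $s \in S$, given strategies $\tau, \pi$ for the players, is $\Pr_s^{\tau,\pi}(\Phi)$. Given a
state $s \in S$ and an objective $\Phi$, we are interested in the maximal
probability with which player 1 can ensure that $\Phi$ holds and player 2 can
ensure that $\overline{\Phi}$ holds from $s$. We call such a probability the value of the
game $G$ at $s$ for player $i \in \{1,2\}$. The values for player 1 and player 2 are
given by the functions $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi) : S \to [0,1]$ and
$\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi}) : S \to [0,1]$, defined for all $s \in S$ by

\[
\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) = \sup_{\tau \in \Gamma} \inf_{\pi \in \Pi} \Pr\nolimits_s^{\tau,\pi}(\Phi),
\]

\[
\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s) = \sup_{\pi \in \Pi} \inf_{\tau \in \Gamma} \Pr\nolimits_s^{\tau,\pi}(\overline{\Phi}).
\]

Note that the objectives of the players are complementary and hence we
have a zero-sum game. Concurrent games satisfy a quantitative version of
determinacy [13], stating that for all Borel objectives $\Phi$ and all $s \in S$, we
have

\[
\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) + \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s) = 1.
\]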
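The determinacy equation can be illustrated in the simplest possible setting: a one-round game with two moves per player, where it reduces to the minimax theorem for matrix games. The sketch below is only an illustration of that one-step special case and says nothing about the infinite games covered by [13]; the function name `value_2x2` and the example matrices are hypothetical. It uses the standard closed form for 2x2 zero-sum matrix games.

```python
from fractions import Fraction


def value_2x2(m):
    """Value for the row player (maximizer) of the zero-sum matrix game m[i][j]."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))   # best pure guarantee of the row player
    minimax = min(max(a, c), max(b, d))   # best pure guarantee of the column player
    if maximin == minimax:                # saddle point in pure strategies
        return Fraction(maximin)
    # No saddle point: both players mix, and the value has a closed form.
    return Fraction(a * d - b * c, a + d - b - c)


def complement_game(m):
    """Matrix for player 2 as maximizer: probability of the complementary objective."""
    (a, b), (c, d) = m
    return [[1 - a, 1 - c], [1 - b, 1 - d]]


# Entry m[i][j]: probability that player 1's objective is satisfied when
# player 1 plays row i and player 2 plays column j (hypothetical numbers).
examples = [
    [[1, 0], [0, 1]],
    [[Fraction(1, 2), 1], [0, Fraction(3, 4)]],
    [[Fraction(3, 4), Fraction(1, 4)], [0, 1]],
]
sums = [value_2x2(m) + value_2x2(complement_game(m)) for m in examples]
```

For each example matrix, the value for player 1 and the value for player 2 (computed on the complementary payoffs, with player 2 as the maximizer) add up to 1.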

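A strategy $\tau$ for player 1 is optimal for objective $\Phi$ if for all $s \in S$ we
have

\[
\inf_{\pi \in \Pi} \Pr\nolimits_s^{\tau,\pi}(\Phi) = \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s).
\]

For $\varepsilon > 0$, a strategy $\tau$ for player 1 is $\varepsilon$-optimal for objective $\Phi$ if for
all $s \in S$ we have

\[
\inf_{\pi \in \Pi} \Pr\nolimits_s^{\tau,\pi}(\Phi) \ge \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) - \varepsilon.
\]

We define optimal and $\varepsilon$-optimal strategies for player 2 symmetrically. For
$\varepsilon > 0$, an objective $\Phi$ for player 1 and $\overline{\Phi}$ for player 2, we denote by
$\Gamma_\varepsilon(\Phi)$ and $\Pi_\varepsilon(\overline{\Phi})$ the sets of $\varepsilon$-optimal strategies for player 1 and
player 2, respectively. Note that the quantitative determinacy of
concurrent games is equivalent to the existence of $\varepsilon$-optimal strategies
for objective $\Phi$ for player 1 and $\overline{\Phi}$ for player 2, for all $\varepsilon > 0$, at all
states $s \in S$, i.e., for all $\varepsilon > 0$, $\Gamma_\varepsilon(\Phi) \neq \emptyset$ and $\Pi_\varepsilon(\overline{\Phi}) \neq \emptyset$.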

We refer to the analysis of limit-sure winning states (the set of states $s$
such that $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) = 1$) and $\varepsilon$-limit-sure winning strategies
($\varepsilon$-optimal strategies for the limit-sure winning states) as the
qualitative analysis of objective $\Phi$. We refer to the analysis of the
values and the $\varepsilon$-optimal strategies as the quantitative analysis of
objective $\Phi$.

3  Positive  Limit-one  Property 

The positive limit-one property for concurrent games for a class $\mathcal{C}$ of
objectives states that for all objectives $\Phi \in \mathcal{C}$, for all concurrent games
$G$, if there is a state $s$ such that the value for player 1 is positive at $s$
for objective $\Phi$, then there is a state $s'$ where the value for player 1 is 1
for objective $\Phi$. The property means that if a player can win with positive
value from some state, then from some state she can win with value 1. The
positive limit-one property was proved for parity objectives in [6] and has
been one of the key properties used in the algorithmic analysis of
concurrent games with parity objectives [1]. In this section we prove the
positive limit-one property for concurrent games with tail objectives, and
thereby extend the positive limit-one property from parity objectives to a
richer class of objectives that subsumes several canonical $\omega$-regular
objectives. Our proof uses a result from martingale theory and a novel
strategy construction, whereas the proof for the sub-class of parity
objectives [6] followed from complementation arguments for quantitative
$\mu$-calculus formulas.

Notation. In the setting of concurrent games the natural filtration sequence
$(\mathcal{F}_n)$ for the stochastic process under any pair of strategies is defined as

\[
\mathcal{F}_n = \sigma(\Theta_1, \Theta_2, \ldots, \Theta_n),
\]

i.e., the $\sigma$-field generated by the random variables $\Theta_1, \Theta_2, \ldots, \Theta_n$.

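Lemma 1 (Lévy’s 0-1 law) Suppose $\mathcal{H}_n \uparrow \mathcal{H}_\infty$, i.e., $\mathcal{H}_n$ is a sequence of
increasing $\sigma$-fields and $\mathcal{H}_\infty = \sigma(\bigcup_n \mathcal{H}_n)$. For all events $\mathcal{A} \in \mathcal{H}_\infty$ we have

\[
E(\mathbf{1}_{\mathcal{A}} \mid \mathcal{H}_n) = \Pr(\mathcal{A} \mid \mathcal{H}_n) \to \mathbf{1}_{\mathcal{A}} \quad \text{almost-surely (i.e., with probability 1)},
\]

where $\mathbf{1}_{\mathcal{A}}$ is the indicator function of event $\mathcal{A}$.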

The proof of the lemma is available in Durrett (pages 262-263) [8]. An
immediate consequence of Lemma 1 in the setting of concurrent games is
the following lemma.

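Lemma 2 (0-1 law in concurrent games) For all events
$\mathcal{A} \in \mathcal{F}_\infty = \sigma(\bigcup_n \mathcal{F}_n)$, for all strategies $(\tau, \pi) \in \Gamma \times \Pi$, for all states $s$,
we have

\[
\Pr\nolimits_s^{\tau,\pi}(\mathcal{A} \mid \mathcal{F}_n) \to \mathbf{1}_{\mathcal{A}} \quad \text{almost-surely},
\]

where $\mathbf{1}_{\mathcal{A}}$ is the indicator function of event $\mathcal{A}$.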

Intuitively, the lemma means that the probability $\Pr_s^{\tau,\pi}(\mathcal{A} \mid \mathcal{F}_n)$ converges
almost-surely (i.e., with probability 1) to 0 or 1 (since indicator functions
take values in the range $\{ 0, 1 \}$). Note that the tail $\sigma$-field $\mathcal{T}$ is a subset
of $\mathcal{F}_\infty$, i.e., $\mathcal{T} \subseteq \mathcal{F}_\infty$, and hence the result of Lemma 2 holds for all
$\mathcal{A} \in \mathcal{T}$.
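The convergence of conditional probabilities to 0 or 1 can be observed in a very small special case. The sketch below is an illustration under strong assumptions, not a rendering of the lemma in its generality: it uses a fair random walk on $\{0, \ldots, \mathit{top}\}$ with absorbing end points (a Markov chain, i.e., a game in which neither player has a choice), and the event "the walk is absorbed at `top`". For this walk the conditional probability of the event given the history is the current position divided by `top`, a standard fact about the fair gambler's ruin. The function name and parameters are hypothetical.

```python
import random
from fractions import Fraction


def conditional_probabilities(start, top, rng):
    """Sample one path of the fair walk and return, for each step n, the
    conditional probability of absorption at `top` given the first n states."""
    state = start
    trace = [Fraction(state, top)]
    while 0 < state < top:              # stop once an absorbing end point is hit
        state += rng.choice((-1, 1))
        trace.append(Fraction(state, top))
    return trace


trace = conditional_probabilities(2, 4, random.Random(0))
```

Along every sampled path the sequence in `trace` starts at 1/2 and ends at either 0 or 1, which is the behavior that Lemma 2 describes for the indicator of the event.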
Theorem 1 (Positive limit-one property) For all concurrent games,
for all tail objectives $\Phi$, if there exists a state $s$ such that
$\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) > 0$, then there exists a state $s'$ such that
$\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s') = 1$.

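Proof. Assume towards a contradiction that there exists a state $s$ such
that $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) > 0$, but for all states $s'$ we have
$\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s') < 1$. Since
$\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s') = 1 - \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s')$, we have
$\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s') > 0$ for all states $s'$. Fix $\eta$ such that
$0 < \eta = \min_{s' \in S} \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s')$. Let
$0 < 2\varepsilon < \min\{ \eta, \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) \}$. Fix an $\varepsilon$-optimal strategy $\tau_\varepsilon$
for player 1. We define a sequence of strategies $\pi_i$ for player 2: let $\pi_0$
be an $\varepsilon$-optimal strategy for player 2. The strategy $\pi_{i+1}$ is defined as
follows. For a history $\langle s_1, s_2, \ldots, s_k \rangle$ we have

\[
\pi_{i+1}(\langle s_1, s_2, \ldots, s_k \rangle) =
\begin{cases}
\pi_i(\langle s_1, s_2, \ldots, s_k \rangle) & \text{if } \Pr\nolimits_s^{\tau_\varepsilon,\pi_i}(\overline{\Phi} \mid \langle s_1, s_2, \ldots, s_k \rangle) > \varepsilon, \\
\widetilde{\pi}_i(\langle s_1, s_2, \ldots, s_k \rangle) & \text{if } \Pr\nolimits_s^{\tau_\varepsilon,\pi_i}(\overline{\Phi} \mid \langle s_1, s_2, \ldots, s_k \rangle) \le \varepsilon,
\end{cases}
\]

where $\widetilde{\pi}_i$ is an $\varepsilon$-optimal strategy from state $s_k$.

Intuitively, the strategy $\pi_{i+1}$ is as follows: for a history
$\langle s_1, s_2, \ldots, s_k \rangle$, if the strategy $\pi_i$ ensures value greater than $\varepsilon$, then
$\pi_{i+1}$ follows strategy $\pi_i$; else it switches to an $\varepsilon$-optimal strategy $\widetilde{\pi}_i$
from state $s_k$. Since $\widetilde{\pi}_i$ is an $\varepsilon$-optimal strategy from state $s_k$,
$\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s') \ge \eta > 2\varepsilon$ for all states $s'$, and $\overline{\Phi}$ is a tail objective
that is independent of all finite prefixes, after a switch the conditional
probability of $\overline{\Phi}$ is at least $\eta - \varepsilon > \varepsilon$, and hence we have
$\Pr_s^{\tau_\varepsilon,\pi_{i+1}}(\overline{\Phi}) \ge \Pr_s^{\tau_\varepsilon,\pi_i}(\overline{\Phi})$. Let

\[
\Pr\nolimits_s^{\tau_\varepsilon,\pi_\infty}(\overline{\Phi}) = \lim_{i \to \infty} \Pr\nolimits_s^{\tau_\varepsilon,\pi_i}(\overline{\Phi})
\]

(the limit exists since it is a non-decreasing sequence of values bounded
by 1), where $\pi_\infty = \lim_{i \to \infty} \pi_i$. Since $\overline{\Phi}$ is a tail objective (i.e.,
$\overline{\Phi} \in \lim_{n \to \infty} \sigma(\Theta_n, \Theta_{n+1}, \ldots)$), it follows that
$\Pr_s^{\tau_\varepsilon,\pi_\infty}(\overline{\Phi}) \ge \Pr_s^{\tau_\varepsilon,\pi_i}(\overline{\Phi})$ for all $i \ge 0$. Again, by the construction
of the sequence of strategies we have

\[
\Pr\nolimits_s^{\tau_\varepsilon,\pi_\infty}(\overline{\Phi} \mid \langle s_1, s_2, s_3, \ldots, s_n \rangle) \ge \varepsilon
\]

for all histories $\langle s_1, s_2, s_3, \ldots, s_n \rangle$. It follows from Lemma 2 that
$\Pr_s^{\tau_\varepsilon,\pi_\infty}(\overline{\Phi} \mid \mathcal{F}_n)$ converges to a value in $\{ 0, 1 \}$ almost-surely. Since
these conditional probabilities are at least $\varepsilon > 0$ on all histories, the
limit cannot be 0. Hence we conclude that
$\Pr_s^{\tau_\varepsilon,\pi_\infty}(\overline{\Phi} \mid \mathcal{F}_n) \to 1$ almost-surely, i.e.,
$\Pr_s^{\tau_\varepsilon,\pi_\infty}(\Phi \mid \mathcal{F}_n) \to 0$ almost-surely, and therefore
$\Pr_s^{\tau_\varepsilon,\pi_\infty}(\Phi) = 0$. Since $\tau_\varepsilon$ is an $\varepsilon$-optimal strategy, we get that
$\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) \le \varepsilon$. But by assumption,
$2\varepsilon < \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s)$, and hence we have a contradiction. Thus the desired
result is established. ∎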
4  Strategy  Characterization  for  Tail  Objectives


In this section we show that in concurrent games with tail objectives,
witnesses for $\varepsilon$-optimal strategies can be constructed using witnesses for
$\varepsilon$-limit-sure winning strategies of sub-games that respect certain local
optimality conditions. The result characterizes the strategy complexity
for quantitative optimality for tail objectives in terms of qualitative
optimality and local optimality.

We relate the values of zero-sum games with tail objectives to the
Nash equilibrium values of nonzero-sum games with reachability objectives.
The result shows that the values of a zero-sum game with complicated
objectives can be related to equilibrium values of a nonzero-sum game with
simpler objectives. We also show that for MDPs the value function for a
tail objective $\Phi$ can be computed by computing the maximal probability of
reaching the set of states with value 1. As an immediate consequence of the
above analysis, we obtain a polytime reduction of the quantitative analysis
of MDPs with tail objectives to the qualitative analysis.

Local optimality. A key notion in the construction of $\varepsilon$-optimal
strategies is local optimality. Informally, a selector function $\xi$ is
locally optimal if it is optimal in the one-step matrix game where each
state $s$ is assigned the reward value $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s)$. A locally
optimal strategy is a strategy that consists of locally optimal selectors.
A locally $\varepsilon$-optimal strategy is a strategy that has a total deviation from
locally optimal selectors of at most $\varepsilon$. We note that local $\varepsilon$-optimality and
$\varepsilon$-optimality are very different notions. Local $\varepsilon$-optimality concerns only the
approximation of a local selector; a locally $\varepsilon$-optimal strategy provides no
guarantee of yielding a probability of winning the game close to the optimal
one.

Definition 2 (Locally $\varepsilon$-optimal selectors and strategies) A selector
$\xi$ is locally optimal for objective $\Phi$ if for all $s \in S$ and $a_2 \in Mv_2(s)$ we have

$$E\big[\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(\Theta_1) \mid s, \xi(s), a_2\big] \geq \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s).$$

We denote by $\Lambda^{\ell}(\Phi)$ the set of locally optimal selectors for objective $\Phi$. A
strategy $\tau$ is locally optimal for objective $\Phi$ if for every history $\langle s_0, s_1, \ldots, s_k \rangle$
we have $\tau(\langle s_0, s_1, \ldots, s_k \rangle) \in \Lambda^{\ell}(\Phi)$, i.e., player 1 plays a locally optimal
selector at every stage of the play. We denote by $\Gamma^{\ell}(\Phi)$ the set of locally
optimal strategies for objective $\Phi$. A strategy $\tau_\varepsilon$ is locally $\varepsilon$-optimal for
objective $\Phi$ if for every strategy $\pi \in \Pi$, for all $k \geq 1$, for all states $s$ we have

$$\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) - E_s^{\tau_\varepsilon,\pi}\big[\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(\Theta_k)\big] \leq \varepsilon.$$

Observe that a strategy that at each round $i$ chooses a locally optimal selector
with probability at least $(1 - \varepsilon_i)$, with $\sum_i \varepsilon_i \leq \varepsilon$, is a locally $\varepsilon$-optimal
strategy. We denote by $\Gamma^{\ell}_{\varepsilon}(\Phi)$ the set of locally $\varepsilon$-optimal strategies for objective
$\Phi$. ■
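
The one-step inequality of Definition 2 can be checked mechanically once a value vector is known. The following is a minimal sketch, not part of the original development: the dictionary encoding, the function names `expected_next_value` and `is_locally_optimal`, and the toy game with its assumed values are all hypothetical and serve only to illustrate the condition.

```python
from typing import Dict, Tuple

# Hypothetical encoding (not from the paper): states and moves are strings,
# delta[(s, a1, a2)] is the successor distribution, value[s] stands in for
# the value of player 1 at s, and xi[s] is the selector's distribution at s.
Dist = Dict[str, float]


def expected_next_value(value: Dict[str, float],
                        delta: Dict[Tuple[str, str, str], Dist],
                        s: str, xi_s: Dist, a2: str) -> float:
    """One-step expected value when player 1 plays xi_s and player 2 plays a2."""
    return sum(p1 * p * value[t]
               for a1, p1 in xi_s.items()
               for t, p in delta[(s, a1, a2)].items())


def is_locally_optimal(value, delta, moves2, xi, tol=1e-9) -> bool:
    """Check the inequality of Definition 2 at every state and player-2 move."""
    return all(expected_next_value(value, delta, s, xi[s], a2) >= value[s] - tol
               for s in value for a2 in moves2[s])


# Toy example: a matching-pennies-like state "s" between two absorbing states.
value = {"s": 0.5, "win": 1.0, "lose": 0.0}  # assumed values, for illustration
moves2 = {"s": ["c", "d"], "win": ["c"], "lose": ["c"]}
delta = {
    ("s", "a", "c"): {"win": 1.0}, ("s", "a", "d"): {"lose": 1.0},
    ("s", "b", "c"): {"lose": 1.0}, ("s", "b", "d"): {"win": 1.0},
    ("win", "a", "c"): {"win": 1.0}, ("lose", "a", "c"): {"lose": 1.0},
}
absorbing = {"win": {"a": 1.0}, "lose": {"a": 1.0}}
uniform = {"s": {"a": 0.5, "b": 0.5}, **absorbing}  # mixes a and b evenly
pure_a = {"s": {"a": 1.0}, **absorbing}             # always plays a
```

In this toy game the uniform selector passes the check, while the pure selector fails it because player 2 can answer with move `d`. Passing the check says nothing about winning the whole game, in line with the remark that local optimality and optimality are different notions.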


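We first show that for all tail objectives, for all $\varepsilon > 0$, there exist strategies
that are $\varepsilon$-optimal and locally $\varepsilon$-optimal as well.


Lemma 3 For all tail objectives $\Phi$, for all $\varepsilon > 0$,

1. $\Gamma_{\varepsilon/2}(\Phi) \subseteq \Gamma^{\ell}_{\varepsilon}(\Phi)$.

2. $\Gamma_{\varepsilon}(\Phi) \cap \Gamma^{\ell}_{\varepsilon}(\Phi) \neq \emptyset$.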


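Proof. For $\varepsilon > 0$, fix an $\frac{\varepsilon}{2}$-optimal strategy $\tau$ for player 1. By definition $\tau$
is an $\varepsilon$-optimal strategy as well. We argue that $\tau \in \Gamma^{\ell}_{\varepsilon}(\Phi)$. Assume towards
contradiction that $\tau \notin \Gamma^{\ell}_{\varepsilon}(\Phi)$, i.e., there exists a player 2 strategy $\pi$, a state
$s$, and $k$ such that

$$\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) - E_s^{\tau,\pi}\big[\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(\Theta_k)\big] > \varepsilon.$$

Fix a strategy $\pi^* = (\pi + \widehat{\pi})$ for player 2 as follows: play $\pi$ for $k$ steps, then
switch to an $\frac{\varepsilon}{4}$-optimal strategy $\widehat{\pi}$. Formally, for a history $\langle s_1, s_2, \ldots, s_n \rangle$ we
have

$$\pi^*(\langle s_1, s_2, \ldots, s_n \rangle) =
\begin{cases}
\pi(\langle s_1, s_2, \ldots, s_n \rangle) & \text{if } n \leq k \\
\widehat{\pi}(\langle s_{k+1}, s_{k+2}, \ldots, s_n \rangle) & \text{if } n > k,
\end{cases}$$

where $\widehat{\pi}$ is an $\frac{\varepsilon}{4}$-optimal strategy.

Since $\Phi$ is a tail objective we have

$$\Pr_s^{\tau,\pi^*}(\Phi) \leq E_s^{\tau,\pi}\big[\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(\Theta_k)\big] + \frac{\varepsilon}{4} \quad \text{(since $\widehat{\pi}$ is an $\tfrac{\varepsilon}{4}$-optimal strategy)}.$$

Hence we have

$$\Pr_s^{\tau,\pi^*}(\Phi) < \big(\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) - \varepsilon\big) + \frac{\varepsilon}{4} = \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) - \frac{3\varepsilon}{4} < \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) - \frac{\varepsilon}{2}.$$

Since by assumption $\tau$ is an $\frac{\varepsilon}{2}$-optimal strategy we have a contradiction.
This establishes the desired result. ■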

A  value  class  of  the  game  is  the  set  of  all  states  where  the  game  has  a  given 
value. 


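Definition 3 (Value class) A value class $VC(r)$ is the set of states $s$ such
that the value for player 1 is $r$. Formally, $VC(r) = \{ s \mid \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) = r \}$.
Note that for any game there are at most $|S|$ many value classes. By $VC_{<r}$
we denote the set $\{ s \mid \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) < r \}$, and similarly we use $VC_{>r}$ to
denote the set $\{ s \mid \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) > r \}$. ■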
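
As a small illustration of Definition 3, the sketch below groups states by value. It assumes the value of every state is already available as an exact rational; the function names and the example values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, Set


def value_classes(value: Dict[str, Fraction]) -> Dict[Fraction, Set[str]]:
    """Partition the states into value classes VC(r), keyed by the value r."""
    classes = defaultdict(set)
    for state, r in value.items():
        classes[r].add(state)
    return dict(classes)


def vc_above(value: Dict[str, Fraction], r: Fraction) -> Set[str]:
    """States with value strictly greater than r."""
    return {s for s, v in value.items() if v > r}


def vc_below(value: Dict[str, Fraction], r: Fraction) -> Set[str]:
    """States with value strictly smaller than r."""
    return {s for s, v in value.items() if v < r}


# Assumed values for a four-state example; exact fractions avoid the
# pitfalls of comparing floating-point values for equality.
value = {"s0": Fraction(1, 2), "s1": Fraction(1, 2),
         "win": Fraction(1), "lose": Fraction(0)}
classes = value_classes(value)
```

Since every state lies in exactly one class, the number of classes is bounded by the number of states, which matches the note in the definition.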




Reduction. We present a reduction from every value class $VC(r)$ to a
concurrent game $\widetilde{G}_r$ and then establish a few key properties of the game $\widetilde{G}_r$.
Let $G = (S, \mathit{Moves}, Mv_1, Mv_2, \delta)$ be a concurrent game with a tail objective
$\Phi$ for player 1. For a state $s$ we define the set of allowable actions as follows:

$$\mathrm{OptSupp}(s) = \{ \gamma \subseteq Mv_1(s) \mid \text{there is a locally optimal selector } \xi^{\ell} \in \Lambda^{\ell}(\Phi) \text{ with } \mathrm{Supp}(\xi^{\ell}(s)) = \gamma \}.$$

Consider a value class $VC(r)$ with $0 < r < 1$. We construct a concurrent
game $\widetilde{G}_r = (\widetilde{S}_r, \widetilde{\mathit{Moves}}, \widetilde{Mv}_1, \widetilde{Mv}_2, \widetilde{\delta})$ as follows:

1. State space. Given a state $s$ let $\mathrm{OptSupp}(s) = \{ \gamma_1, \gamma_2, \ldots, \gamma_k \}$. Then
   we have

   $$\widetilde{S}_r = \{ \widetilde{s} \mid s \in VC(r) \} \cup \{ w_1, w_2 \}.$$

2. Moves assignment. $\widetilde{Mv}_1(\widetilde{s}) = \{ 1, 2, \ldots, k \}$ such that $\mathrm{OptSupp}(s) =
   \{ \gamma_1, \gamma_2, \ldots, \gamma_k \}$, and $\widetilde{Mv}_2(\widetilde{s}) = Mv_2(s)$.

3. Transition function.

   (a) The states $w_1$ and $w_2$ are absorbing states such that player 1 has
       value 1 and 0 at state $w_1$ and $w_2$, respectively.

   (b) Transition function at state $\widetilde{s}$.

       i. For any move $a_2 \in Mv_2(s)$, if there is a move $a_1 \in \gamma_i$ such
          that $\sum_{s' \notin VC(r)} \delta(s, a_1, a_2)(s') > 0$, then $\widetilde{\delta}(\widetilde{s}, i, a_2)(w_1) = 1$.
          The above transition specifies that if, for a move $a_2$ for
          player 2 and a move $a_1 \in \gamma_i$ for player 1, the game $G$
          proceeds to a different value class with positive probability,
          then in $\widetilde{G}_r$ the game proceeds to the state $w_1$, which has
          value 1 for player 1, with probability 1. Note that since
          $a_1 \in \gamma_i$ and $\gamma_i \in \mathrm{OptSupp}(s)$, if in $G$ the game proceeds to a
          different value class with positive probability it also proceeds
          to $VC_{>r}$ with positive probability.

       ii. For any move $a_2 \in Mv_2(s)$, if for every move $a_1 \in \gamma_i$ we have
           $\sum_{s' \in VC(r)} \delta(s, a_1, a_2)(s') = 1$, then

           $$\widetilde{\delta}(\widetilde{s}, i, a_2)(\widetilde{s}') = \sum_{a_1 \in \gamma_i} \xi^{\ell}(s)(a_1) \cdot \delta(s, a_1, a_2)(s'),$$

           where $\xi^{\ell}$ is a locally optimal selector with $\mathrm{Supp}(\xi^{\ell}(s)) = \gamma_i$.

       iii. For any move $a_1 \in (Mv_1(s) \setminus \gamma_i)$, for any move $a_2 \in Mv_2(s)$
            we have:

            $$\widetilde{\delta}(\widetilde{s}, a_1, a_2)(\widetilde{s}') = \delta(s, a_1, a_2)(s') \quad \text{for } s' \in VC(r);$$
            $$\widetilde{\delta}(\widetilde{s}, a_1, a_2)(w_2) = \sum_{s' \notin VC(r)} \delta(s, a_1, a_2)(s').$$
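
Cases (i) and (ii) of the transition function can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `reduced_transition`, the dictionary encoding of $\delta$, and the example game are hypothetical, the locally optimal selector for $\gamma_i$ is assumed to be supplied by the caller, and case (iii) is not covered.

```python
from typing import Dict, Set, Tuple

Dist = Dict[str, float]


def reduced_transition(delta: Dict[Tuple[str, str, str], Dist],
                       vc_r: Set[str], s: str, gamma: Set[str],
                       xi_s: Dist, a2: str) -> Dist:
    """Successor distribution in the reduced game for the index of gamma.

    delta[(s, a1, a2)] is the successor distribution in G, vc_r is VC(r),
    and xi_s is a selector at s whose support is gamma (assumed to be
    locally optimal; this function does not verify that).
    """
    # Case (i): some move in gamma leaves VC(r) with positive probability,
    # so the reduced game moves to the absorbing state w1 with probability 1.
    for a1 in gamma:
        if any(p > 0 and t not in vc_r for t, p in delta[(s, a1, a2)].items()):
            return {"w1": 1.0}
    # Case (ii): every move in gamma stays inside VC(r); average the
    # original transitions according to the selector.
    out: Dist = {}
    for a1 in gamma:
        for t, p in delta[(s, a1, a2)].items():
            out[t] = out.get(t, 0.0) + xi_s[a1] * p
    return out


# Hypothetical example: states "s" and "u" form VC(r), "win" lies outside.
vc_r = {"s", "u"}
delta = {
    ("s", "a", "c"): {"s": 1.0}, ("s", "b", "c"): {"u": 1.0},
    ("s", "a", "d"): {"s": 0.5, "win": 0.5}, ("s", "b", "d"): {"u": 1.0},
}
gamma = {"a", "b"}
xi_s = {"a": 0.5, "b": 0.5}
stay = reduced_transition(delta, vc_r, "s", gamma, xi_s, "c")
leave = reduced_transition(delta, vc_r, "s", gamma, xi_s, "d")
```

Against move `c` the play stays in the value class and the successor distribution is the selector-weighted average; against move `d` one move in the support leaves the class, so the reduced game jumps to `w1`.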


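Notation. Let $U_{>0} = \{ \widetilde{s} \in \widetilde{S}_r \setminus \{ w_2 \} \mid \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(\widetilde{s}) > 0 \}$ and
$U_1 = \{ \widetilde{s} \in \widetilde{S}_r \setminus \{ w_2 \} \mid \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(\widetilde{s}) = 1 \}$.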

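Strategy maps. We define two strategy maps: $f : \Gamma \to \widetilde{\Gamma}$, which maps a
strategy $\tau$ in the game $G$ to a strategy $\widetilde{\tau} = f(\tau)$ in the game $\widetilde{G}_r$; and $t : \widetilde{\Pi} \to \Pi$,
which maps a strategy $\widetilde{\pi}$ in the game $\widetilde{G}_r$ to a strategy $\pi = t(\widetilde{\pi})$ in the game $G$.
The strategy maps are defined as follows:

1. Given a strategy $\tau$ in the game $G$, the strategy $\widetilde{\tau} = f(\tau)$ in the game
   $\widetilde{G}_r$ is as follows:

   • $\widetilde{\tau}(\langle s_0, s_1, \ldots, s_k \rangle)(j) = \sum_{a \in \gamma_j} \tau(\langle s_0, s_1, \ldots, s_k \rangle)(a)$, where
     $\gamma_j = \arg\max_{\gamma \in \mathrm{OptSupp}(s_k)} \sum_{a \in \gamma} \tau(\langle s_0, s_1, \ldots, s_k \rangle)(a)$; and for
     all $a' \notin \gamma_j$ we have
     $\widetilde{\tau}(\langle s_0, (s_0, i_0), s_1, (s_1, i_1), \ldots, s_k, (s_k, j) \rangle)(a') = \tau(\langle s_0, s_1, \ldots, s_k \rangle)(a')$.

2. Given a strategy $\widetilde{\pi}$ in the game $\widetilde{G}_r$, the strategy $\pi = t(\widetilde{\pi})$ in the game
   $G$ is as follows:

   $$\pi(\langle s_0, s_1, \ldots, s_k \rangle) =
   \begin{cases}
   \widetilde{\pi}(\langle s_0, s_1, \ldots, s_k \rangle) & \text{if } \forall i.\ 0 \leq i \leq k.\ s_i \in VC(r) \\
   \pi' & \text{otherwise, where } \pi' \text{ is an arbitrary strategy.}
   \end{cases}$$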

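Fact 1. It follows from the construction of the game $\widetilde{G}_r$ that for all strategies
$\tau \in \Gamma^{\ell}_{\varepsilon}(\Phi)$, for all states $\widetilde{s} \in \widetilde{S}_r \setminus \{ w_2 \}$, for all strategies $\widetilde{\pi}$ for player 2 we
have $\Pr_{\widetilde{s}}^{\widetilde{\tau},\widetilde{\pi}}(\mathrm{Reach}(\{ w_2 \})) \leq \varepsilon$, where $\widetilde{\tau} = f(\tau)$.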

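Lemma 4 Let $\tau_\varepsilon \in \Gamma_{\varepsilon}(\Phi) \cap \Gamma^{\ell}_{\varepsilon}(\Phi)$, for $0 < 2\varepsilon < r$. For all strategies $\widetilde{\pi} \in \widetilde{\Pi}$,
for all states $\widetilde{s} \in \widetilde{S}_r$ we have

$$\Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\Phi) \geq r - 2\varepsilon,$$

where $\widetilde{\tau}_\varepsilon = f(\tau_\varepsilon)$.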
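Proof. Consider a strategy $\widetilde{\pi}$ in the game $\widetilde{G}_r$ and let $\pi = t(\widetilde{\pi})$. Since $\tau_\varepsilon$
is an $\varepsilon$-optimal strategy and $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) = r$, we have $\Pr_s^{\tau_\varepsilon,\pi}(\Phi) \geq r - \varepsilon$.

Thus we have

$$\begin{aligned}
r - \varepsilon \leq \Pr_s^{\tau_\varepsilon,\pi}(\Phi) &= \Pr_s^{\tau_\varepsilon,\pi}(\Phi \cap \mathrm{Safe}(VC(r))) \\
&\quad + \Pr_s^{\tau_\varepsilon,\pi}(\Phi \mid \mathrm{Reach}(VC_{>r} \cup VC_{<r})) \cdot \Pr_s^{\tau_\varepsilon,\pi}(\mathrm{Reach}(VC_{>r} \cup VC_{<r})) \\
&\leq \Pr_s^{\tau_\varepsilon,\pi}(\Phi \cap \mathrm{Safe}(VC(r))) + \Pr_s^{\tau_\varepsilon,\pi}(\mathrm{Reach}(VC_{>r} \cup VC_{<r})).
\end{aligned}$$

It follows from the construction of the game $\widetilde{G}_r$ that we have

$$\Pr_s^{\tau_\varepsilon,\pi}(\Phi \cap \mathrm{Safe}(VC(r))) = \Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\Phi \cap \mathrm{Safe}(\widetilde{S}_r \setminus \{ w_1, w_2 \})).$$

Since $\tau_\varepsilon$ is a locally $\varepsilon$-optimal strategy, from Fact 1 and the construction of
the game $\widetilde{G}_r$ we have

$$\Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\mathrm{Reach}(\{ w_1 \})) \geq \Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\mathrm{Reach}(\{ w_1, w_2 \})) - \varepsilon = \Pr_s^{\tau_\varepsilon,\pi}(\mathrm{Reach}(VC_{>r} \cup VC_{<r})) - \varepsilon.$$

Since $w_1$ is a winning absorbing state for player 1 we have

$$\begin{aligned}
\Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\Phi) &= \Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\Phi \cap \mathrm{Safe}(\widetilde{S}_r \setminus \{ w_1, w_2 \})) \\
&\quad + \Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\Phi \mid \mathrm{Reach}(\{ w_1 \})) \cdot \Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\mathrm{Reach}(\{ w_1 \})) \\
&= \Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\Phi \cap \mathrm{Safe}(\widetilde{S}_r \setminus \{ w_1, w_2 \})) + \Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\mathrm{Reach}(\{ w_1 \})) \\
&\geq \Pr_s^{\tau_\varepsilon,\pi}(\Phi \cap \mathrm{Safe}(VC(r))) + \Pr_s^{\tau_\varepsilon,\pi}(\mathrm{Reach}(VC_{>r} \cup VC_{<r})) - \varepsilon \geq r - 2\varepsilon.
\end{aligned}$$

The Lemma follows. ■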


Lemma 5 If $U_{>0} \neq \emptyset$, then $U_1 \neq \emptyset$.


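Proof. The argument is similar to the proof of Theorem 1. Assume towards
contradiction that $U_1 = \emptyset$, $U_{>0} \neq \emptyset$, and let $\widetilde{s} \in U_{>0}$. Fix $0 < 3\varepsilon <
\min\{ r, \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(\widetilde{s}) \}$. Let $\widetilde{\pi}_\varepsilon$ be an $\varepsilon$-optimal strategy for player 2. We
construct a sequence of strategies $\widetilde{\tau}^{\,i}_\varepsilon$ for player 1 as follows:

1. Let $\tau_\varepsilon \in \Gamma^{\ell}_{\varepsilon}(\Phi) \cap \Gamma_{\varepsilon}(\Phi)$, i.e., $\tau_\varepsilon$ is an $\varepsilon$-optimal and locally $\varepsilon$-optimal
   strategy in the game $G$ (the fact that $\Gamma^{\ell}_{\varepsilon}(\Phi) \cap \Gamma_{\varepsilon}(\Phi) \neq \emptyset$ follows from
   Lemma 3). Then $\widetilde{\tau}^{\,0}_\varepsilon = \widetilde{\tau}_\varepsilon = f(\tau_\varepsilon)$.

2. The strategy $\widetilde{\tau}^{\,i+1}_\varepsilon$ is inductively defined as follows:

   $$\widetilde{\tau}^{\,i+1}_\varepsilon(\langle s_0, s_1, \ldots, s_k \rangle) =
   \begin{cases}
   \widetilde{\tau}^{\,i}_\varepsilon(\langle s_0, s_1, \ldots, s_k \rangle) & \text{if } \Pr_{\widetilde{s}}^{\widetilde{\tau}^{\,i}_\varepsilon,\widetilde{\pi}_\varepsilon}(\Phi \mid \langle s_0, s_1, \ldots, s_k \rangle) > \varepsilon \\
   \widetilde{\tau}'_\varepsilon(\langle s_0, s_1, \ldots, s_k \rangle) & \text{if } \Pr_{\widetilde{s}}^{\widetilde{\tau}^{\,i}_\varepsilon,\widetilde{\pi}_\varepsilon}(\Phi \mid \langle s_0, s_1, \ldots, s_k \rangle) \leq \varepsilon,
   \end{cases}$$

   where $\tau'_\varepsilon$ is an $\varepsilon$-optimal and locally $\varepsilon$-optimal strategy
   from $s_k$ in $G$ and $\widetilde{\tau}'_\varepsilon = f(\tau'_\varepsilon)$.

Let $\widetilde{\tau}^{\,\infty}_\varepsilon = \lim_{i \to \infty} \widetilde{\tau}^{\,i}_\varepsilon$. Since $3\varepsilon < r$ (i.e., $r - 2\varepsilon > \varepsilon$), it follows from Lemma 4
and arguments similar to Theorem 1 that for all histories $\langle s_0, s_1, s_2, \ldots, s_n \rangle$
we have $\Pr_{\widetilde{s}}^{\widetilde{\tau}^{\,\infty}_\varepsilon,\widetilde{\pi}_\varepsilon}(\Phi \mid \langle s_0, s_1, s_2, \ldots, s_n \rangle) > \varepsilon$. It follows from arguments
similar to Theorem 1 that we have $\Pr_{\widetilde{s}}^{\widetilde{\tau}^{\,\infty}_\varepsilon,\widetilde{\pi}_\varepsilon}(\Phi \mid \mathcal{F}_n) \to 1$ almost-surely, i.e.,
$\Pr_{\widetilde{s}}^{\widetilde{\tau}^{\,\infty}_\varepsilon,\widetilde{\pi}_\varepsilon}(\overline{\Phi} \mid \mathcal{F}_n) \to 0$ almost-surely. Hence we have a contradiction that
$\widetilde{s} \in U_{>0}$. ■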

Lemma 6 For every $r > 0$, for every state $s \in VC(r)$, the state $\widetilde{s}$ is a
limit-sure winning state in the game $\widetilde{G}_r$ for player 1, i.e., from state $\widetilde{s}$ player 1
can win with probability arbitrarily close to 1.

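Proof. To prove the desired result we show that $U_{>0} = \emptyset$. It follows from
Lemma 5 that it suffices to show that $U_1 = \emptyset$. Fix $0 < 2\varepsilon < r$, and let $\tau_\varepsilon$ be
a locally $\varepsilon$-optimal and $\varepsilon$-optimal strategy in $G$, i.e., $\tau_\varepsilon \in \Gamma^{\ell}_{\varepsilon}(\Phi) \cap \Gamma_{\varepsilon}(\Phi)$ (the
fact that $\Gamma^{\ell}_{\varepsilon}(\Phi) \cap \Gamma_{\varepsilon}(\Phi) \neq \emptyset$ follows from Lemma 3). Assume for the sake
of contradiction that $U_1$ is non-empty. Let $\widetilde{s} \in U_1$ and let $\widetilde{\pi}_\varepsilon$ be an $\varepsilon$-optimal
strategy for player 2 from $\widetilde{s}$. We construct a strategy $\widetilde{\tau}_\varepsilon$ for player 1 in $\widetilde{G}_r$
and a strategy $\pi_\varepsilon$ for player 2 in $G$ as follows:

1. Strategy $\widetilde{\tau}_\varepsilon$ in the game $\widetilde{G}_r$ is defined as $\widetilde{\tau}_\varepsilon = f(\tau_\varepsilon)$.

2. Strategy $\pi_\varepsilon$ in the game $G$ is defined as $\pi_\varepsilon = t(\widetilde{\pi}_\varepsilon)$.

Since $\tau_\varepsilon$ is locally $\varepsilon$-optimal, we have $\Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}_\varepsilon}(\mathrm{Reach}(\{ w_2 \})) \leq \varepsilon$. Since
$\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(\widetilde{s}) = 1$ and $\widetilde{\pi}_\varepsilon$ is an $\varepsilon$-optimal strategy, we have that
$\Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}_\varepsilon}(\overline{\Phi} \cap \mathrm{Safe}(\widetilde{S}_r \setminus \{ w_1, w_2 \})) \geq 1 - \varepsilon$. Hence it follows that

$$\Pr_s^{\tau_\varepsilon,\pi_\varepsilon}(\overline{\Phi}) \geq \Pr_s^{\tau_\varepsilon,\pi_\varepsilon}(\overline{\Phi} \cap \mathrm{Safe}(VC(r))) \geq 1 - \varepsilon.$$

Hence we have $\Pr_s^{\tau_\varepsilon,\pi_\varepsilon}(\Phi) \leq \varepsilon$. Since $\tau_\varepsilon$ is an $\varepsilon$-optimal strategy, $r > 2\varepsilon$,
and $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) = r$, we get a contradiction. Thus the desired result
follows. ■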

Definition 4 (Value-class qualitative $\varepsilon$-optimal strategy) For an
objective $\Phi$, for $\varepsilon > 0$, a strategy $\tau_\varepsilon$ is a value-class qualitative $\varepsilon$-optimal
strategy for a value class $VC(r)$, with $0 < r < 1$, if

1. $\tau_\varepsilon$ is locally $\varepsilon$-optimal.

2. for all strategies $\pi \in \Pi$, for all states $s$, for all histories
   $\langle s_0, s_1, s_2, \ldots, s_k \rangle$ such that $s_k \in VC(r)$,
   $\Pr_s^{\tau_\varepsilon,\pi}(\Phi \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle, \mathrm{Safe}(VC(r))) \geq 1 - \varepsilon$.

A strategy $\tau_\varepsilon$ is value-class qualitative $\varepsilon$-optimal for objective $\Phi$ if it is
value-class qualitative $\varepsilon$-optimal for all value classes $VC(r)$, for $0 < r < 1$.
Value-class qualitative $\varepsilon$-optimal strategies for player 2 are defined similarly.
We denote by $\Gamma^{vq}_{\varepsilon}(\Phi)$ and $\Pi^{vq}_{\varepsilon}(\overline{\Phi})$ the sets of value-class qualitative $\varepsilon$-optimal
strategies for objectives $\Phi$ and $\overline{\Phi}$ for player 1 and player 2, respectively. ■

Observe that the $\varepsilon$-limit-sure winning strategies in the game $\widetilde{G}_r$ satisfy the
requirement for value-class qualitative $\varepsilon$-optimal strategies for the value class
$VC(r)$. The existence of value-class qualitative $\varepsilon$-optimal strategies for tail
objectives follows from Lemma 6.

Lemma 7 For all tail objectives $\Phi$, for all $\varepsilon > 0$, $\Gamma^{vq}_{\varepsilon}(\Phi) \neq \emptyset$ and
$\Pi^{vq}_{\varepsilon}(\overline{\Phi}) \neq \emptyset$.


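We denote by $W_1 = \{ s \mid \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) = 1 \}$ and $W_2 = \{ s \mid \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s) = 1 \}$
the sets of states where player 1 and player 2 have value 1, respectively.

Lemma 8 Let $\tau_\varepsilon$ be a locally $\varepsilon$-optimal strategy. For all strategies $\pi$
for player 2, if $\Pr_s^{\tau_\varepsilon,\pi}(\mathrm{Reach}(W_1 \cup W_2)) = 1$, then

$$\Pr_s^{\tau_\varepsilon,\pi}(\mathrm{Reach}(W_1)) \geq \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) - \varepsilon.$$

Proof. The result follows from the fact that the sequence $(\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(\Theta_i))_i$
is a sub-martingale under $\tau_\varepsilon$ and $\pi$. ■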

The following Lemma shows that the value-class qualitative $\varepsilon$-optimal
strategies for different value classes can be "stitched" or composed together to
produce an $\varepsilon$-optimal strategy. The key argument is as follows: if a play
stays in $S \setminus (W_1 \cup W_2)$ then by properties of value-class qualitative $\varepsilon$-optimal
strategies player 1 wins with probability 1; else the play reaches $W_1 \cup W_2$
and then $\varepsilon$-optimality is guaranteed by local $\varepsilon$-optimality and Lemma 8.
The details of the argument are similar to Lemma 14 in [1].

Lemma 9 (Stitching Lemma) Let $\tau_\varepsilon$ be a value-class qualitative
$\varepsilon$-optimal strategy such that $\tau_\varepsilon$ is an $\varepsilon$-optimal strategy for all states in $W_1$. Then
$\tau_\varepsilon$ is an $\varepsilon$-optimal strategy.

Lemma 7, Lemma 9 and the characterization of value-class qualitative
$\varepsilon$-optimal strategies as $\varepsilon$-limit-sure winning strategies in sub-games and locally
$\varepsilon$-optimal strategies establish the following Theorem. The Theorem states
that witnesses for $\varepsilon$-optimal strategies can be constructed from witnesses of
$\varepsilon$-limit-sure winning strategies in sub-games and locally $\varepsilon$-optimal strategies.
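Theorem 2 (Limit-sure to $\varepsilon$-optimality) For all tail objectives $\Phi$, for
$\varepsilon > 0$, let $\tau_\varepsilon$ be a strategy such that

1. $\tau_\varepsilon$ is locally $\varepsilon$-optimal, i.e., $\tau_\varepsilon \in \Gamma^{\ell}_{\varepsilon}(\Phi)$;

2. for all value classes $VC(r)$, with $r > 0$, for all strategies $\widetilde{\pi}$ in $\widetilde{G}_r$, for
   all states $\widetilde{s} \in \widetilde{S}_r$, we have $\Pr_{\widetilde{s}}^{\widetilde{\tau}_\varepsilon,\widetilde{\pi}}(\Phi) \geq 1 - \varepsilon$, where $\widetilde{\tau}_\varepsilon = f(\tau_\varepsilon)$.

Then $\tau_\varepsilon$ is $\varepsilon$-optimal, i.e., $\tau_\varepsilon \in \Gamma_{\varepsilon}(\Phi)$.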

Zero-sum tail games and nonzero-sum reachability games. Given a
gamegraph $G$ with a tail objective $\Phi$, consider the gamegraph $G_A$ such that
every state $s \in W_1 \cup W_2$ is transformed to an absorbing state and the states
in $W_1$ are winning for player 1 and the states in $W_2$ are winning for player 2,
i.e., $\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) = 1$ for $s \in W_1$ and $\langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s) = 1$ for $s \in W_2$. Note
that for every state $s$ the values of $s$ in $G$ and $G_A$ are the same. In
the following Lemma we show that there exist $\varepsilon$-optimal strategies such that,
if the players follow them, the play reaches $W_1 \cup W_2$ with probability 1. We
then extend the result to relate the values of games with tail objectives to
equilibrium values of nonzero-sum games with simple reachability objectives.

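Lemma 10 In the gamegraph $G_A$, let $(\tau, \pi) \in \Gamma^{vq}_{\varepsilon}(\Phi) \times \Pi^{vq}_{\varepsilon}(\overline{\Phi})$, for
sufficiently small $\varepsilon$. Then for all states $s$ we have $\Pr_s^{\tau,\pi}(\mathrm{Reach}(W_1 \cup W_2)) = 1$.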

Proof. We first prove that there exists a constant $\eta > 0$ such that for all
histories $\langle s_0, s_1, s_2, \ldots, s_k \rangle$,

$$\Pr_s^{\tau,\pi}(\mathrm{Reach}(W_1 \cup W_2) \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle) \geq \eta > 0.$$

For all histories $\langle s_0, s_1, s_2, \ldots, s_k \rangle$ such that $s_k \in VC(r)$ we must have
$\Pr_s^{\tau,\pi}(\mathrm{Safe}(VC(r)) \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle) = 0$. If
$\Pr_s^{\tau,\pi}(\mathrm{Safe}(VC(r)) \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle) > 0$, then by properties of value-class
qualitative $\varepsilon$-optimal strategies we have

$$\Pr_s^{\tau,\pi}(\Phi \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle, \mathrm{Safe}(VC(r))) \geq 1 - \varepsilon,$$

$$\Pr_s^{\tau,\pi}(\overline{\Phi} \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle, \mathrm{Safe}(VC(r))) \geq 1 - \varepsilon;$$

which is a contradiction for $\varepsilon < 1/2$. Hence for all histories $\langle s_0, s_1, s_2, \ldots, s_k \rangle$
such that $s_k \in VC(r)$ we have $\Pr_s^{\tau,\pi}(\mathrm{Safe}(VC(r)) \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle) = 0$.
Since value-class qualitative $\varepsilon$-optimal strategies are locally $\varepsilon$-optimal, it
follows that there exists a constant $\eta' > 0$ such that for all histories
$\langle s_0, s_1, s_2, \ldots, s_k \rangle$, if $s_k \in VC(r)$, then we have

$$\Pr_s^{\tau,\pi}(\mathrm{Reach}(VC_{>r}) \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle) \geq \eta' > 0,$$

i.e., the play goes to a greater value class with positive probability $\eta'$. Since
the number of value classes is finite it follows that there exists a constant
$\eta > 0$ such that

$$\Pr_s^{\tau,\pi}(\mathrm{Reach}(W_1 \cup W_2) \mid \langle s_0, s_1, s_2, \ldots, s_k \rangle) \geq \eta > 0. \qquad (1)$$

Since all states in $W_1 \cup W_2$ are absorbing states it follows that $\mathrm{Reach}(W_1 \cup
W_2)$ is a tail objective. Hence by Lemma 2 we have $\Pr_s^{\tau,\pi}(\mathrm{Reach}(W_1 \cup W_2) \mid
\mathcal{F}_n) \to \{0, 1\}$ almost-surely. It follows from (1) that $\Pr_s^{\tau,\pi}(\mathrm{Reach}(W_1 \cup W_2) \mid
\mathcal{F}_n) \to 1$ almost-surely. The desired result follows. ■

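The above Lemma states that if the players play value-class qualitative
$\varepsilon$-optimal strategies, for sufficiently small $\varepsilon$, then the play reaches $W_1 \cup W_2$
with probability 1. Since value-class qualitative $\varepsilon$-optimal strategies are
$\varepsilon$-optimal strategies (Lemma 9), the following lemma is immediate.

Lemma 11 Given a gamegraph $G$ with objectives $\Phi$ for player 1 and $\overline{\Phi}$ for
player 2 we have

$$\lim_{\varepsilon \to 0} \ \sup_{\tau \in \Gamma^{vq}_{\varepsilon}(\Phi)} \ \inf_{\pi \in \Pi^{vq}_{\varepsilon}(\overline{\Phi})} \Pr_s^{\tau,\pi}(\mathrm{Reach}(W_1)) = \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s);$$

$$\lim_{\varepsilon \to 0} \ \sup_{\pi \in \Pi^{vq}_{\varepsilon}(\overline{\Phi})} \ \inf_{\tau \in \Gamma^{vq}_{\varepsilon}(\Phi)} \Pr_s^{\tau,\pi}(\mathrm{Reach}(W_2)) = \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s).$$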

Consider a nonzero-sum reachability game $G_R$ such that the objectives
of both players are reachability objectives: the objective for player 1 is
$\mathrm{Reach}(W_1)$ and the objective for player 2 is $\mathrm{Reach}(W_2)$. Note that the
game $G_R$ is not zero-sum in the following sense: there are infinite paths
$\omega$ such that $\omega \notin \mathrm{Reach}(W_1)$ and $\omega \notin \mathrm{Reach}(W_2)$, and each player gets a
payoff 0 for the path $\omega$. We define $\varepsilon$-Nash equilibria of the game $G_R$ and
relate some special $\varepsilon$-Nash equilibria of $G_R$ to the values of $G$.

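Definition 5 ($\varepsilon$-Nash equilibrium in $G_R$) A strategy profile $(\tau^*, \pi^*) \in
\Gamma \times \Pi$ is an $\varepsilon$-Nash equilibrium at state $s$ if the following two conditions
hold:

$$\Pr_s^{\tau^*,\pi^*}(\mathrm{Reach}(W_1)) \geq \sup_{\tau \in \Gamma} \Pr_s^{\tau,\pi^*}(\mathrm{Reach}(W_1)) - \varepsilon$$

$$\Pr_s^{\tau^*,\pi^*}(\mathrm{Reach}(W_2)) \geq \sup_{\pi \in \Pi} \Pr_s^{\tau^*,\pi}(\mathrm{Reach}(W_2)) - \varepsilon \qquad ■$$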
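Theorem 3 (Nash equilibrium of reachability game $G_R$) The
following assertion holds for the game $G_R$.

1. For all $\varepsilon > 0$, there is an $\varepsilon$-Nash equilibrium $(\tau^*_\varepsilon, \pi^*_\varepsilon) \in \Gamma^{vq}_{\varepsilon}(\Phi) \times \Pi^{vq}_{\varepsilon}(\overline{\Phi})$
   such that for all states $s$ we have

   $$\lim_{\varepsilon \to 0} \Pr_s^{\tau^*_\varepsilon,\pi^*_\varepsilon}(\mathrm{Reach}(W_1)) = \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s)$$

   $$\lim_{\varepsilon \to 0} \Pr_s^{\tau^*_\varepsilon,\pi^*_\varepsilon}(\mathrm{Reach}(W_2)) = \langle\!\langle 2 \rangle\!\rangle_{\mathit{val}}(\overline{\Phi})(s).$$

Proof. It follows from Lemma 11. ■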

Note that in the case of MDPs the strategy for player 2 is trivial, i.e., player 2
has only one strategy. Hence in the context of MDPs we drop the strategy $\pi$ of
player 2. A specialization of Theorem 3 to the case of MDPs yields the following
Theorem.

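Theorem 4 For all MDPs $G_M$, for all tail objectives $\Phi$, we have

$$\langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\Phi)(s) = \sup_{\tau \in \Gamma} \Pr_s^{\tau}(\mathrm{Reach}(W_1)) = \langle\!\langle 1 \rangle\!\rangle_{\mathit{val}}(\mathrm{Reach}(W_1))(s).$$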

Since the values in MDPs with reachability objectives can be computed
in polynomial time (by linear programming) [3, 9], our result presents a
polynomial-time reduction of the quantitative analysis of tail objectives in MDPs
to the qualitative analysis.
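
To make the second step of the reduction concrete, the sketch below approximates the maximal probability of reaching a given target set in a small MDP. It deliberately swaps in plain value iteration for the linear-programming computation cited above, so it only approximates the values and carries no polynomial-time guarantee. The function name `max_reach_values`, the dictionary encoding, and the example MDP are hypothetical, and the target set is assumed to have been obtained beforehand by a qualitative analysis (it plays the role of $W_1$).

```python
from typing import Dict, List, Set, Tuple

Dist = Dict[str, float]


def max_reach_values(states: List[str],
                     moves: Dict[str, List[str]],
                     delta: Dict[Tuple[str, str], Dist],
                     target: Set[str],
                     iterations: int = 1000) -> Dict[str, float]:
    """Approximate the maximal probability of reaching `target` from each state.

    Starts from 0 outside the target and repeatedly applies the one-step
    optimality operator; the iterates increase towards the reachability values.
    """
    v = {s: (1.0 if s in target else 0.0) for s in states}
    for _ in range(iterations):
        v = {s: 1.0 if s in target else
                max(sum(p * v[t] for t, p in delta[(s, a)].items())
                    for a in moves[s])
             for s in states}
    return v


# Hypothetical MDP: from "s0" the move "risky" reaches the target "w" with
# probability 0.5, while "safe" retries with probability 0.5 and otherwise
# reaches "w" with probability 0.3 or the sink "z" with probability 0.2.
states = ["s0", "w", "z"]
moves = {"s0": ["safe", "risky"], "w": ["stay"], "z": ["stay"]}
delta = {
    ("s0", "safe"): {"s0": 0.5, "w": 0.3, "z": 0.2},
    ("s0", "risky"): {"w": 0.5, "z": 0.5},
    ("w", "stay"): {"w": 1.0},
    ("z", "stay"): {"z": 1.0},
}
values = max_reach_values(states, moves, delta, target={"w"})
```

In this example repeating `safe` solves $v = 0.5\,v + 0.3$, giving $v = 0.6$, which beats the value 0.5 of `risky`; the iteration converges to that value at `s0`.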